psycopg2.OperationalError: server closed the connection unexpectedly #829

Closed
RekGRpth opened this issue Dec 21, 2018 · 16 comments

@RekGRpth

This code:

sql = 'select pg_backend_pid()'

class DAL():
    def __init__(self, uri):
        from psycopg2 import connect
        #from pg8000 import connect
        self.cursor = connect(user='postgres', host='postgres').cursor()

    def executesql(self, sql):
        self.cursor.execute(sql)
        return self.cursor.fetchall()

def thread():
    db = DAL('')
    i = 10
    import time
    from pprint import pprint
    while i > 0:
        pprint(dict(function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
        time.sleep(1)
        i -= 1

def process():
    from pprint import pprint
    pprint(dict(function='process:begin'))
    import time
    time.sleep(2)
    pprint(dict(function='process:end'))

def main():
    from pprint import pprint
    pprint(dict(function='main:begin'))
    import threading
    threading.Thread(target=thread, name='mythread').start()
    import time
    time.sleep(2)
    import multiprocessing
    multiprocessing.Process(target=process, name='myprocess').start()
    pprint(dict(function='main:end'))

if __name__ == '__main__': main()

raises the following exception:

Exception in thread mythread:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "test3.py", line 19, in thread
    pprint(dict(function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
  File "test3.py", line 10, in executesql
    self.cursor.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

but with pg8000 everything is OK.

@RekGRpth
Author

RekGRpth commented Dec 21, 2018

Also, with psycopg2 on Python 2 everything is OK.

@dvarrazzo
Member

Try keeping the connection alive. Instead of:

    self.cursor = connect(user='postgres', host='postgres').cursor()

try:

    self.connection = connect(user='postgres', host='postgres')
    self.cursor = self.connection.cursor()

@RekGRpth
Author

The result is the same.

@dimandzhi

I'm having the same issue. As a workaround, I had to create a wrapper around the connection object that hooks the methods I use. In the cursor() method it runs a SELECT 1 "ping" check and reconnects if needed, so it always returns a valid working cursor.
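
A minimal sketch of that kind of ping-and-reconnect wrapper, assuming you reconnect with the same keyword arguments used initially (the connect_kwargs below stand for whatever your application passes to connect()):

    import psycopg2

    class ReconnectingConnection:
        """Hand out cursors from a connection that is pinged first."""

        def __init__(self, **connect_kwargs):
            self._connect_kwargs = connect_kwargs
            self._conn = psycopg2.connect(**connect_kwargs)

        def cursor(self, *args, **kwargs):
            try:
                # Cheap "ping": fails fast if the server went away.
                with self._conn.cursor() as cur:
                    cur.execute('SELECT 1')
            except (psycopg2.OperationalError, psycopg2.InterfaceError):
                # Stale or closed connection: open a fresh one.
                self._conn = psycopg2.connect(**self._connect_kwargs)
            return self._conn.cursor(*args, **kwargs)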

@dvarrazzo
Member

I've played a bit with this issue, which is reproducible. Using the following script:

import time
import threading
import multiprocessing
from pprint import pprint
from psycopg2 import connect

sql = 'select pg_backend_pid()'


class DAL():
    def __init__(self, uri):
        self.conn = connect("")
        self.cursor = self.conn.cursor()

    def executesql(self, sql):
        self.cursor.execute(sql)
        return self.cursor.fetchall()


def thread():
    db = DAL('')
    i = 3
    while i > 0:
        pprint(dict(
            function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
        time.sleep(1)
        i -= 1


def process():
    pprint(dict(function='process:begin'))
    time.sleep(2)
    pprint(dict(function='process:end'))


def main():
    pprint(dict(function='main:begin'))
    t = threading.Thread(target=thread, name='mythread')
    t.start()
    time.sleep(1)
    multiprocessing.Process(target=process, name='myprocess').start()
    t.join()
    pprint(dict(function='main:end'))


if __name__ == '__main__':
    main()

The script has the following output, on Py 2.7 and 3.6:

$ PYTHONPATH=build/lib.2.7/ python ./test-829.py 
{'function': 'main:begin'}
{'executesql': '16892', 'function': 'thread:run'}
{'function': 'process:begin'}
{'executesql': '16892', 'function': 'thread:run'}
{'executesql': '16892', 'function': 'thread:run'}
{'function': 'process:end'}
{'function': 'main:end'}
$ PYTHONPATH=build/lib.3.6/ python3 ./test-829.py 
{'function': 'main:begin'}
{'executesql': '16956', 'function': 'thread:run'}
{'executesql': '16956', 'function': 'thread:run'}
{'function': 'process:begin'}
Exception in thread mythread:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "./test-829.py", line 25, in thread
    function='thread:run', executesql='%s' % db.executesql(sql)[0][0]))
  File "./test-829.py", line 16, in executesql
    self.cursor.execute(sql)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.


{'function': 'main:end'}
{'function': 'process:end'}

Saving the output of strace in two different files:

PYTHONPATH=build/lib.2.7/ strace -f python -u ./test-829.py > out-829-py2 2>&1
PYTHONPATH=build/lib.3.6/ strace -f python3 -u ./test-829.py > out-829-py3 2>&1

After making the pid numbers uniform in the files (P0, P1, P2), here is the relevant part (omitting what comes before the connection to the db):

python 2 strace

$ egrep '(= 3$|sendto|close|clone)' out-829-py2
clone(child_stack=0x7fd47e85afb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7fd47e85b9d0, tls=0x7fd47e85b700, child_tidptr=0x7fd47e85b9d0) = P1
[...]
[pid  P1] socket(AF_UNIX, SOCK_STREAM, 0) = 3
[pid  P1] sendto(3, "\0\0\0?\0\3\0\0user\0piro\0database\0piro\0"..., 63, MSG_NOSIGNAL, NULL, 0) = 63
[pid  P1] sendto(3, "Q\0\0\0\nBEGIN\0", 11, MSG_NOSIGNAL, NULL, 0) = 11
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P0] close(6)                    = 0
[pid  P0] close(4)                    = 0
[pid  P0] clone( <unfinished ...>
[pid  P0] <... clone resumed> child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd485bfda10) = P2
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P1] sendto(3, "X\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
[pid  P1] close(3 <unfinished ...>
[pid  P1] <... close resumed> )       = 0
close(5)                                = 0

python 3 strace

$ egrep '(= 3$|sendto|close|clone)' out-829-py3
clone(child_stack=0x7f4662f92fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f4662f939d0, tls=0x7f4662f93700, child_tidptr=0x7f4662f939d0) = P1
[...]
[pid  P1] socket(AF_UNIX, SOCK_STREAM, 0) = 3
[pid  P1] sendto(3, "\0\0\0?\0\3\0\0user\0piro\0database\0piro\0"..., 63, MSG_NOSIGNAL, NULL, 0) = 63
[pid  P1] sendto(3, "Q\0\0\0\nBEGIN\0", 11, MSG_NOSIGNAL, NULL, 0) = 11
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = 29
[pid  P0] close(4)                    = 0
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
[pid  P1] <... sendto resumed> )      = 29
[pid  P0] close(4)                    = 0
[pid  P0] close(4)                    = 0
[pid  P0] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f466a434a10) = P2
[pid  P0] close(5)                    = 0
[pid  P2] sendto(3, "X\0\0\0\4", 5, MSG_NOSIGNAL, NULL, 0) = 5
[pid  P2] close(3)                    = 0
[pid  P2] close(4)                    = 0
[pid  P2] openat(AT_FDCWD, "/dev/null", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] sendto(3, "Q\0\0\0\34select pg_backend_pid()\0", 29, MSG_NOSIGNAL, NULL, 0) = -1 EPIPE (Broken pipe)
[pid  P1] close(3)                    = 0
[pid  P1] openat(AT_FDCWD, "/usr/lib/python3.6/threading.py", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0
[pid  P1] openat(AT_FDCWD, "./test-829.py", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0
psycopg2.OperationalError: server closed the connection unexpectedly
[pid  P1] openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0
[pid  P1] openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
[pid  P1] close(3)                    = 0

I see here that on Python 3, the process P2, politely, closes the connection belonging to P1 (the X command in the sendto() call) 😳

Similar behaviour can be seen in a run with debug enabled: on Python 3 the trace would be something like:

[P1] pq_execute: executing SYNC query: pgconn = 0xP1
[P1]     select pg_backend_pid()
[P1] pq_execute: entering synchronous DBAPI compatibility mode
[P1] pq_fetch: pgstatus = PGRES_TUPLES_OK
[P1] pq_fetch: got tuples
[P1] _pq_fetch_tuples: looking for cast 23:
[P1] curs_get_cast:        per-connection dict: (nil)
[P1] curs_get_cast:        global dict: 0x..
[P1] _pq_fetch_tuples: using cast at 0x.. for type 23
[P1] psyco_curs_execute: res = 1, pgres = 0xP2
[P2] conn_close: PQfinish called
[P1] _psyco_curs_prefetch: result = 0
[P2] connection_dealloc: deleted connection object at 0x7f9e15f28180, refcnt = 0
[P1] _psyco_curs_buildrow: row 0, element 0, len 5
[P2] cursor_dealloc: deleted cursor object at 0x7f9e161069f8, refcnt = 0
[P1] _psyco_curs_buildrow: val->refcnt = 1
{'executesql': 'P3', 'function': 'thread:run'}
{'function': 'process:begin'}
[P1] psyco_curs_execute: starting execution of new query
[P1] pq_execute: pg connection at 0xP1 OK
[P1] pq_begin_locked: pgconn = 0xP1, autocommit = 0, status = 2
[P1] pq_begin_locked: transaction in progress
[P1] pq_execute: executing SYNC query: pgconn = 0xP1
[P1]     select pg_backend_pid()
[P1] psyco_curs_execute: res = -1, pgres = (nil)

So, on Py 3 it seems the objects inherited by the forked process undergo GC there, which for the connection sends a close command over a still-open FD.

Maybe this can be fixed by making sure that deleting the object doesn't call PQfinish if the current pid is not the one the connection was created in.
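
Translated to Python, the guard would look roughly like this (a sketch of the idea only; the real fix belongs in psycopg2's C layer):

    import os
    import psycopg2

    class PidGuardedConnection:
        """Close the libpq connection only from the process that created it."""

        def __init__(self, dsn):
            self._pid = os.getpid()  # remember the creating process
            self._conn = psycopg2.connect(dsn)

        def close(self):
            # After fork() the child inherits the socket FD; closing it
            # from the child would terminate the parent's session too.
            if os.getpid() == self._pid:
                self._conn.close()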

dvarrazzo added a commit that referenced this issue Mar 17, 2019
Unrelated processes close the FD of the connection. This happens in
Python 3.6 but not 2.7. Let's see if travis shows where else it fails...
@leongold

leongold commented Oct 8, 2019

Still hitting this issue on 2.8.3, under heavy multiprocessing use. Any particular info you'd like? Any hints? Will temporarily disabling GC help?

@dvarrazzo
Member

@leongold This bug is fixed. You are not hitting this bug, maybe another one. Please provide a test to reproduce your problem, thank you.

@yshahak

yshahak commented Oct 10, 2019

I'm getting this as well on 2.8.3. I'm doing multiprocessing of insertions: I have about 4000 to do, and at about 600 I get this:
psycopg2.OperationalError: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.

@dvarrazzo
Member

@yshahak This might even mean that your server closed the connection. Did you check in the server logs if that's what happened?

This bug is about an issue for which, on Python 3, processes closed each other's connections. If you produce an example demonstrating it's the client's fault, then we can do something. Failing that, I assume it's exactly what the message says: the server closed your connection, and you can't do much about it client side.

@leongold

@dvarrazzo I'm not entirely sure whether the test case fits exactly. I'll try to find the time to write my own test case; however, in the meanwhile:

  • I instantiate a connection object early in the subprocess' lifecycle. It doesn't do anything for a while.
  • When it's time to INSERT, I hit the aforementioned error (server closed the connection unexpectedly).
  • As a workaround, I re-instantiate a connection object before it's time to INSERT.

I'm not sure what causes this, but what's apparent to me is that a connection left idle for too long silently times out/disconnects.

In your reply to @yshahak you mentioned it might indeed be the server closing the client's connection; I don't believe that is the case. I believe the connection is already closed by that time.

@dvarrazzo
Member

@leongold remember that you cannot share connections across processes. If you open a connection, fork, and use it from the forked process, it won't work.
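
A safe pattern is to connect inside each worker, after the fork; a sketch (the empty DSN is a placeholder for your real connection parameters):

    import multiprocessing
    import psycopg2

    def worker(dsn):
        # Connect *inside* the child process, after the fork; never reuse
        # a connection object inherited from the parent.
        conn = psycopg2.connect(dsn)
        with conn, conn.cursor() as cur:
            cur.execute('select pg_backend_pid()')
            print(cur.fetchone()[0])
        conn.close()

    if __name__ == '__main__':
        dsn = ''  # placeholder
        procs = [multiprocessing.Process(target=worker, args=(dsn,))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()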

@leongold

leongold commented Oct 10, 2019

@dvarrazzo each subprocess instantiates its own connection... which also leads me to believe subprocessing isn't relevant here. Could it be there's simply an issue of idle connections silently disconnecting?

edit:

The server logs have a bunch of:

2019-10-09 00:10:49.762 UTC [2439] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:49.762 UTC [2439] LOG:  unexpected EOF on client connection with an open transaction
2019-10-09 00:10:49.799 UTC [2436] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:49.799 UTC [2436] LOG:  unexpected EOF on client connection with an open transaction
2019-10-09 00:10:50.067 UTC [2447] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:50.067 UTC [2447] LOG:  unexpected EOF on client connection with an open transaction
2019-10-09 00:10:50.476 UTC [2445] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:50.476 UTC [2445] LOG:  unexpected EOF on client connection with an open transaction
2019-10-09 00:10:50.553 UTC [2441] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:50.553 UTC [2441] LOG:  unexpected EOF on client connection with an open transaction
2019-10-09 00:10:50.738 UTC [2448] LOG:  could not receive data from client: Connection reset by peer
2019-10-09 00:10:50.738 UTC [2448] LOG:  unexpected EOF on client connection with an open transaction

edit 2:

Another possibly important tidbit: my architecture was working flawlessly for months; this started occurring when the time between connection instantiation and the first INSERT grew by a couple of minutes.

And, as mentioned above, simply creating a new connection before INSERTing works around it.

@rahulhacker

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Can someone help me out on this?

@NayangiriGoswamiGit

"This bug is about an issue for which, on Python 3, processes closed each other's connections." Maybe I'm facing this issue while uploading bulk data with the multiprocessing library. I'm on psycopg2 version 2.8. Does anyone know the solution here?

@AndyMender

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Can someone help me out on this?

I encountered this while using my SQLAlchemy connector wrappers. The high-level cause was too much data in INSERT queries.

Since one typically doesn't have access to SQLAlchemy internals, what @dimandzhi recommended with SELECT 1 pings is a good idea. Alternatively, chunk your data into smaller blocks, as sketched below. It will be slower (more database calls) but more resistant to connection disruptions of any sort.
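
A chunked insert with plain psycopg2 could look like this (the items table and its columns are made up for illustration):

    from psycopg2.extras import execute_batch

    def insert_chunked(conn, rows, chunk_size=1000):
        """Insert rows in chunks, committing each chunk separately."""
        with conn.cursor() as cur:
            for start in range(0, len(rows), chunk_size):
                chunk = rows[start:start + chunk_size]
                # execute_batch collapses many INSERTs into few round trips
                execute_batch(
                    cur,
                    'INSERT INTO items (name, value) VALUES (%s, %s)',
                    chunk)
                conn.commit()  # keep each transaction small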

@dvarrazzo
Member

Or it could just be down to your server being configured with a statement_timeout that you are hitting.
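
You can check that setting from the client, and raise it per session if needed; a quick sketch (empty DSN as a placeholder):

    import psycopg2

    conn = psycopg2.connect('')  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute('SHOW statement_timeout')  # '0' means no limit
        print(cur.fetchone()[0])
        cur.execute("SET statement_timeout = '5min'")  # this session only
    conn.commit()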

@psycopg psycopg locked as resolved and limited conversation to collaborators Feb 14, 2020